In [1]:
import math
import numpy as np
# Useful function for drawing a DNA state, for instance at the root
def drawNewState():
i = np.random.randint(0,4)
if i == 0:
return 'A'
elif i==1:
return 'C'
elif i==2:
return 'G'
elif i==3:
return 'T'
else :
print("Error drawNewState")
exit(-1)
# Function to simulate along a branch using the Gillespie algorithm under the Jukes Cantor model (1969)
# In this case, we consider that there can be a mutation towards the same state.
# So the sum of the rates is 0.25*4=1.0
def simulateAlongBranch(length, startingState):
l = 0.0
current = startingState
while (l < length):
l = l + np.random.exponential(scale=1.0) # 1.0 is the sum of the rates of all possible events
if l < length:
current = drawNewState()
return current
# Compute the probability of a tree
def computeTreeProbabilityBySimulating(bl_a1, bl_c, bl_a2, bl_y):
Nsucces = 0
Nreps = 100000
iter = 0
while iter < Nreps:
x = drawNewState()
y = simulateAlongBranch(bl_y, x)
A1 = simulateAlongBranch(bl_a1, y)
C = simulateAlongBranch(bl_c, y)
A2 = simulateAlongBranch(bl_a2, x)
if (A1=="A" and C=="C" and A2=="A"):
Nsucces+=1
iter = iter + 1
return (Nsucces/Nreps)
if __name__ == '__main__':
p = computeTreeProbabilityBySimulating(0.1, 0.1, 0.4, 0.3)
print("Likelihood of ((A:0.1, C:0.1):0.3, A:0.4); for pattern {A,C,A}: "+ str(p))
print("Log-likelihood of ((A:0.1, C:0.1):0.3, A:0.4); for pattern {A,C,A}: "+ str(math.log(p)))
In common software, substitutions to the same state are not considered, and therefore branch lengths need to be rescaled. Indeed, branch lengths correspond to expected numbers of substitutions: in our case, we expect 4/3 times more substitutions than in common software. Therefore, to compare the likelihoods, we have to run common software on branch lengths that are 3/4 those of the tree, i.e.: ((A1:0.075, C:0.075):0.225, A2:0.3); When I do that, I get : loglk=-5.480185. So the Gillespie sampling approach is not too far from the Felsenstein algorithm exact approach.
In [ ]: